ABSTRACT


Games are natural models for multi-agent machine learning set- 
tings, such as generative adversarial networks (GANs). The desir- 
able outcomes from algorithmic interactions in these games are 
encoded as game theoretic equilibrium concepts, e.g. Nash and 
coarse correlated equilibria. As directly computing an equilibrium 
is typically impractical, one often aims to design learning algo- 
rithms that iteratively converge to equilibria. A growing body of 
negative results casts doubt on this goal, from non-convergence 
to chaotic and even arbitrary behaviour. In this paper we add a 
strong negative result to this list: learning in games is Turing com- 
plete. Specifically, we prove Turing completeness of the replicator 
dynamic on matrix games, one of the simplest possible settings. 
Our results imply the undecidability of reachability problems for
learning algorithms in games, a special case of which is determining 
equilibrium convergence. 
1 INTRODUCTION


Many multi-agent machine learning settings can be modeled as 
games, from social or economic systems with algorithmic decision- 
makers to popular learning architectures such as generative ad- 
versarial networks (GANs). Desired outcomes in these settings are 
often encoded as equilibrium concepts, and therefore a primary 
goal is identifying machine learning algorithms with provable con- 
vergence to these equilibria. 

While there has been progress in deriving strong time-average 
convergence guarantees for popular online learning algorithms, 
the per-iteration behaviour of learning in games remains elusive. 
Recent results attempt to formalize how elusive these dynamics 
can be, from non-convergence results to establishing chaotic, or 
even essentially arbitrary, behaviour [1, 3, 8, 10, 15]. Experiments 
confirm that chaos can actually be typical behaviour [21]. 

In this work, we add an even more sobering negative result to 
this list: learning in games is Turing complete. Specifically, we 
show that replicator dynamics in matrix games, one of the sim- 
plest possible settings, can simulate an arbitrary Turing machine 
(Theorem 1). Here simulation is defined in terms of reachability, a 
natural decision problem for dynamical systems that asks whether 
a given system and initial condition eventually intersects (reaches) 
a certain set; a dynamical system simulates a Turing machine if the 
corresponding halting problem reduces to the reachability problem. 
Our proof combines two recent results, on the Turing completeness of fluid dynamics [5], and on the approximate universality of learning in games [1].

We believe our results have far-reaching implications for the 
literature on learning in games. Most immediate is the fact that 
the reachability problem is undecidable for no-regret learning in 
general (Corollary 3). This result calls into question the feasibility 
of equilibration as a goal, since even deciding whether a learning 
algorithm gets close to an equilibrium is a special case of reachabil- 
ity. More broadly, these results establish the computational power 
of learning dynamics in games and, accordingly, their inherent
complexity as formalized by computability theory.

Beyond the continuous-time setting, we borrow tools from nu- 
merical analysis to show that the multiplicative weights algorithm 
can simulate any bounded Turing machine (Theorem 2). Extending 
this analysis to arbitrary Turing machines, and thus establishing 
Turing completeness for the discrete-time setting, may not be pos- 
sible with the techniques we consider. Establishing (or refuting) the 
Turing completeness of multiplicative weights is therefore left as an 
important open question, and one that will likely require entirely 
new techniques. 


2 PRELIMINARIES 
2.1 Matrix Games 


A finite $n$-player normal form game consists of $n$ agents $[n] = \{1, \dots, n\}$, where each agent $i \in [n]$ can choose actions from a finite action set $S_i$. Actions are chosen by agent $i$ according to a mixed strategy, a distribution $x_i$ in the probability $|S_i|$-simplex $\Delta^{|S_i|} = \{x_i \in \mathbb{R}^{|S_i|}_{\geq 0} : \sum_{s_i \in S_i} x_{is_i} = 1\}$. In normal form games, agents receive payoffs from pairwise interactions according to payoff matrices $A_{i,j} \in \mathbb{R}^{|S_i| \times |S_j|}$, where $i, j \in [n]$ and $i \neq j$. Given that mixed strategies $x_i \in \Delta^{|S_i|}$ and $x_j \in \Delta^{|S_j|}$ are chosen, agent $i$ receives payoff $x_i^\top A_{i,j} x_j$. These payoffs yield a natural optimization problem for each agent, where agents act strategically and independently to maximize their expected payoff over the other agents' mixed strategies, i.e.

$$\max_{x_i \in \Delta^{|S_i|}} \sum_{j \in [n],\, j \neq i} x_i^\top A_{i,j} x_j, \qquad i \in [n]. \tag{1}$$

Throughout the paper we'll restrict our attention to the case known as matrix games, when $n = 2$.
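For the two-player case used throughout, the objective in eq. (1) reduces to the bilinear form $x_i^\top A_{i,j} x_j$. A minimal pure-Python sketch, assuming strategies and payoff matrices are stored as plain lists (the helper names and the rock-paper-scissors matrix are our own illustrative choices, not taken from the text):

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def expected_payoff(A: Matrix, x_i: Vector, x_j: Vector) -> float:
    """Return x_i^T A x_j: agent i's expected payoff against mixed strategy x_j."""
    return sum(x_i[r] * A[r][c] * x_j[c]
               for r in range(len(x_i)) for c in range(len(x_j)))


def is_mixed_strategy(x: Vector, tol: float = 1e-12) -> bool:
    """Check membership in the probability simplex: nonnegative entries summing to 1."""
    return all(v >= 0 for v in x) and abs(sum(x) - 1.0) <= tol


# Hypothetical example: rock-paper-scissors payoffs for the row player
# (win = 1, loss = -1, tie = 0).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

uniform = [1 / 3, 1 / 3, 1 / 3]
rock = [1.0, 0.0, 0.0]
paper = [0.0, 1.0, 0.0]
```

Here `expected_payoff(RPS, paper, rock)` evaluates to 1.0 (paper beats rock), while the uniform strategy earns 0 against any pure strategy.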


2.2 Follow-the-Regularized-Leader (FTRL) 
Learning and Replicator Dynamics 

In many game settings, the optimization in eq. (1) is a moving target since the opponent adaptively updates their strategy and the payoff matrix may be unknown. In such settings, arguably the best-known class of algorithms is Follow-the-Regularized-Leader (FTRL).


The continuous-time version of an FTRL algorithm is as follows. Given initial payoff vector $y_i(0)$, an agent $i$ that plays against agent $j$ in a matrix game $A_{i,j}$ updates their strategy at time $t$ according to

$$\begin{aligned}
y_i(t) &= y_i(0) + \int_0^t A_{i,j}\, x_j(s)\, ds, \\
x_i(t) &= \arg\max_{x_i \in \Delta^{|S_i|}} \left\{ \langle x_i, y_i(t) \rangle - h_i(x_i) \right\},
\end{aligned} \tag{2}$$


where $h_i$ is strongly convex and continuously differentiable. FTRL effectively performs a balancing act between exploration and exploitation. The cumulative payoff vector $y_i(t)$ indicates the total payouts until time $t$, i.e. if agent $i$ had played strategy $s_i \in S_i$ continuously from $t = 0$ until time $t$, agent $i$ would receive a total reward of $y_{is_i}(t)$. The two best-known instantiations of FTRL dynamics are the online gradient descent algorithm when $h_i(x_i) = \|x_i\|_2^2$, and the replicator dynamics (the continuous-time analogue of Multiplicative Weights Update [2]) when $h_i(x_i) = \sum_{s_i \in S_i} x_{is_i} \ln x_{is_i}$. FTRL dynamics in continuous time have bounded regret in arbitrary games [17]. For more information on FTRL dynamics and online optimization, see [24].
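The argmax in eq. (2) has a closed form for both regularizers above. As a sketch (function names are hypothetical): with the entropic regularizer it is the softmax of $y_i(t)$, and with $h_i(x_i) = \|x_i\|_2^2$ completing the square shows it is the Euclidean projection of $y_i(t)/2$ onto the simplex, computed here with the standard sort-and-threshold method.

```python
import math
from typing import List


def entropic_choice(y: List[float]) -> List[float]:
    """argmax_x <x, y> - sum_s x_s ln x_s over the simplex, i.e. softmax(y).

    This is the choice map behind replicator dynamics."""
    m = max(y)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in y]
    z = sum(w)
    return [v / z for v in w]


def euclidean_choice(y: List[float]) -> List[float]:
    """argmax_x <x, y> - ||x||_2^2 over the simplex.

    Since ||x - y/2||^2 = ||x||^2 - <x, y> + const, this equals the
    Euclidean projection of y/2 onto the simplex."""
    u = [v / 2 for v in y]
    s = sorted(u, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, v in enumerate(s, start=1):
        cumsum += v
        t = (cumsum - 1.0) / k
        if v - t > 0:
            theta = t  # the largest k passing this test fixes the threshold
    return [max(v - theta, 0.0) for v in u]
```

Unlike the softmax, which keeps every action in play, the Euclidean choice map can assign zero weight to an action: `euclidean_choice([4.0, 0.0])` returns `[1.0, 0.0]`.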

In this paper, we will focus on replicator dynamics (RD) as the 
learning process generating game dynamics. In addition to its role in 
optimization, replicator dynamics is the prototypical dynamic stud- 
ied in evolutionary game theory [22, 30] and is one of the key math- 
ematical models of evolution and biological competition [23, 28]. 
In this context, replicator dynamics can be thought of as a normal- 
ized form of ecological population models, and is studied given a 
single payoff matrix A and a single probability distribution x that 
can be thought of abstractly as capturing the proportions of different
species/strategies in the current population. Species/strategies get 
randomly paired up and the resulting payoff determines which 
strategies will increase/decrease over time. 

Formally, the dynamics are as follows. Let $A \in \mathbb{R}^{m \times m}$ be a matrix game and $x \in \Delta^m$ be the mixed strategy played. RD on $A$ are given by:

$$\dot{x}_i = \frac{dx_i}{dt} = x_i\left((Ax)_i - x^\top A x\right), \qquad i \in [m]. \tag{3}$$

Under the symmetry of $A_{i,j} = A_{j,i}$, and of initial conditions (i.e. $x_i = x_j$ at $t = 0$), it is immediate to see that, with the entropic regularizer, the solutions $x_i$, $x_j$ of eq. (2) are identical to each other and to the solution of eq. (3) with $A = A_{i,j} = A_{j,i}$. For our purposes, it will suffice to focus on exactly this setting of matrix games defined by a single payoff matrix $A$ and a single probability distribution $x$, which is actually the standard setting within evolutionary game theory.
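Eq. (3) is straightforward to evaluate. The sketch below (helper names are our own, and forward Euler is chosen purely for illustration) computes the replicator vector field. Because $\sum_i x_i((Ax)_i - x^\top A x) = 0$ whenever $\sum_i x_i = 1$, the field is tangent to the simplex and each Euler step preserves the total mass.

```python
from typing import List


def replicator_field(A: List[List[float]], x: List[float]) -> List[float]:
    """Right-hand side of eq. (3): x_i * ((Ax)_i - x^T A x)."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    avg = sum(xi * f for xi, f in zip(x, Ax))  # x^T A x, the average payoff
    return [xi * (f - avg) for xi, f in zip(x, Ax)]


def euler_trajectory(A: List[List[float]], x0: List[float],
                     dt: float, steps: int) -> List[float]:
    """Crude forward-Euler integration of eq. (3), for illustration only."""
    x = list(x0)
    for _ in range(steps):
        v = replicator_field(A, x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

For the hypothetical coordination game $A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, one finds $\dot{x}_1 = x_1(3x_1 - 1)(1 - x_1)$, so any start whose first component exceeds $1/3$ drifts to the pure strategy $(1, 0)$; for instance `euler_trajectory([[2.0, 0.0], [0.0, 1.0]], [0.5, 0.5], 0.01, 5000)` ends close to `[1.0, 0.0]`.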


2.3 Dynamical Systems Theory 


A dynamical system is a mathematical model of a time-evolving process. The object undergoing change in a dynamical system is called its state and is often denoted by $x \in X$, where $X$ is a topological space called a state space. For most of this paper we will be focusing on continuous time systems, but in §5 we will consider discrete time systems derived from numerical approximations of their continuous counterparts. To distinguish between continuous and discrete time, we will use $x(t)$ to describe the state as a function of continuous time $t \in \mathbb{R}$ and $x^t$ to describe the state as a function of discrete time $t \in \mathbb{Z}$.


Change between states in a continuous time dynamical system is described by a flow $\Phi : X \times \mathbb{R} \to X$ satisfying two properties:

(i) For each $t \in \mathbb{R}$, $\Phi(\cdot, t) : X \to X$ is bijective, continuous, and has a continuous inverse.
(ii) For every $s, t \in \mathbb{R}$ and $x \in X$, $\Phi(x, s + t) = \Phi(\Phi(x, t), s)$.

Intuitively, flows describe the evolution of states in the dynamical system. Given a time $t \in \mathbb{R}$, the flow gives us the relative movement of every point $x \in X$; we will denote this by the map $\Phi^t : X \to X$. Similarly, given a point $x \in X$, the flow captures the trajectory of $x$ as a function of time; in an abuse of notation, we will denote this by $x(t)$ where $t$ is changing.
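A one-dimensional example makes properties (i) and (ii) concrete. For $\dot{x} = x(1 - x)$ on $(0, 1)$, which is the first coordinate of eq. (3) when $A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$, the flow has the closed form $\Phi(x, t) = x e^t / (1 - x + x e^t)$. The sketch below (our own illustration) checks the composition law, invertibility, and that differentiating in $t$ recovers the vector field.

```python
import math


def logistic_flow(x: float, t: float) -> float:
    """Closed-form flow of dx/dt = x(1 - x) for x in (0, 1) and any real t."""
    et = math.exp(t)
    return x * et / (1.0 - x + x * et)


x, s, t = 0.2, 1.3, 0.7
# Property (ii): flowing for t and then for s equals flowing for s + t.
assert math.isclose(logistic_flow(logistic_flow(x, t), s), logistic_flow(x, s + t))
# Property (i): Phi(., t) is inverted by Phi(., -t).
assert math.isclose(logistic_flow(logistic_flow(x, t), -t), x)
# A forward difference in t at t = 0 recovers the vector field x(1 - x).
h = 1e-6
assert math.isclose((logistic_flow(x, h) - x) / h, x * (1 - x), rel_tol=1e-4)
```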

Continuous time dynamical systems are often given as systems of ordinary differential equations (ODEs). Systems of ODEs describe a vector field $V : X \to TX$ which assigns to each $x \in X$ a vector in the tangent space of $X$ at $x$. The unit sphere $S^n = \{x \in \mathbb{R}^{n+1} : \|x\|_2 = 1\}$ will play a special role in proving Theorem 1, in which case the tangent space $T_xS^n$ at each $x \in S^n$ is $\{y \in \mathbb{R}^{n+1} : x \cdot y = 0\}$. Intuitively, the tangent space defines bundles of vectors that ensure the system's states remain well defined on the state space as time progresses. A system of ODEs is said to generate (or give) a flow $\Phi$ if $\Phi$ describes a solution of the ODEs at each point $x \in X$. Throughout this paper we assume that all dynamical systems discussed can be given by a system of ODEs. For this reason, we will use the term dynamical system to refer to the system of ODEs, the associated vector field, and a generated flow interchangeably. A well-known result in dynamical systems theory states that, for Lipschitz-continuous systems of ODEs, the generated flow is unique (see [16, 19]) and using these terms interchangeably is well defined.
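Tangency to the sphere is easy to test numerically. The sketch below (hypothetical helpers) checks the condition $x \cdot v = 0$ for a rotation field on $S^2$ and projects an arbitrary vector onto $T_xS^2$ via $v \mapsto v - (x \cdot v)\,x$, which is the orthogonal projection whenever $\|x\|_2 = 1$.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def is_tangent(x: List[float], v: List[float], tol: float = 1e-12) -> bool:
    """True if v lies in T_x S^n = {y : x . y = 0}, up to tol."""
    return abs(dot(x, v)) <= tol


def project_to_tangent(x: List[float], v: List[float]) -> List[float]:
    """Orthogonal projection of v onto T_x S^n; assumes ||x||_2 = 1."""
    c = dot(x, v)
    return [vi - c * xi for xi, vi in zip(x, v)]


def rotation_field(x: List[float]) -> List[float]:
    """Hypothetical vector field on S^2: rotation about the third coordinate axis."""
    return [-x[1], x[0], 0.0]
```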

An important notion for proving Theorem 1, and for dynamical systems in general, is that of a global attracting set of the dynamical system. Let $\Phi$ be a flow generated by some dynamical system on $X$. We say $Y \subseteq X$ is forward invariant for the flow $\Phi$ if $\Phi^t(y) \in Y$ for every $t \geq 0$, $y \in Y$. We say $Y \subseteq X$ is globally attracting for the flow $\Phi$ if $Y$ is nonempty, forward invariant, and

$$Y \supseteq \bigcap_{t \geq 0} \{\Phi^t(x) : x \in X\}. \tag{4}$$

Stated informally, if $Y$ is globally attracting it will eventually capture the dynamics of $\Phi$ starting from any point in $X$ after some transitionary period of time.

Now let $X$ and $Y$ be two topological spaces. We say that a function $f : X \to Y$ is a homeomorphism if (i) $f$ is bijective, (ii) $f$ is continuous, and (iii) $f$ has a continuous inverse. Furthermore, two flows $\Phi : X \times \mathbb{R} \to X$ and $\Psi : Y \times \mathbb{R} \to Y$ are homeomorphic if there exists a homeomorphism $g : X \to Y$ such that for each $x \in X$ and $t \in \mathbb{R}$ we have $g(\Phi(x, t)) = \Psi(g(x), t)$. If $g$ is also $C^1$ and has a $C^1$ inverse, then we say $g$ is a diffeomorphism and that the flows $\Phi$ and $\Psi$ are diffeomorphic. Observe that every diffeomorphism is also a homeomorphism, and thus every pair of diffeomorphic flows are also homeomorphic. Homeomorphic (resp. diffeomorphic) flows satisfy a strong, and typical, notion of equivalence between dynamical systems. Intuitively, two dynamical systems are homeomorphic if their trajectories can be mapped to one another by stretching and bending space.
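As a small example of diffeomorphic flows, the logistic flow of $\dot{x} = x(1 - x)$ on $(0, 1)$ (eq. (3) with $A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$, first coordinate) is conjugate to the translation flow $\Psi(z, t) = z + t$ on $\mathbb{R}$ via the logit map $g(x) = \ln(x / (1 - x))$, a $C^1$ bijection with $C^1$ inverse $z \mapsto 1 / (1 + e^{-z})$. Indeed $\frac{d}{dt} g(x(t)) = \dot{x} / (x(1 - x)) = 1$. The numerical check below is our own illustration.

```python
import math


def logistic_flow(x: float, t: float) -> float:
    """Flow Phi of dx/dt = x(1 - x) on (0, 1)."""
    et = math.exp(t)
    return x * et / (1.0 - x + x * et)


def translation_flow(z: float, t: float) -> float:
    """Flow Psi of dz/dt = 1 on the real line."""
    return z + t


def logit(x: float) -> float:
    """Candidate conjugacy g: (0, 1) -> R."""
    return math.log(x / (1.0 - x))


def inverse_logit(z: float) -> float:
    """g^{-1}: R -> (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def conjugacy_gap(x: float, t: float) -> float:
    """|g(Phi(x, t)) - Psi(g(x), t)|, which should vanish for a conjugacy."""
    return abs(logit(logistic_flow(x, t)) - translation_flow(logit(x), t))
```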


2.4 Turing Machines 


Throughout this paper we rely crucially on the notion of a Turing
complete dynamical system, i.e. a dynamical system able to simulate any Turing machine. We will briefly recall the Turing machine
model and formalize its relationship with dynamical systems. 

A Turing machine is given by a tuple $T = (Q, \Sigma, \delta, q_0, q_{halt})$ where

• $Q$ is a finite set of states, including an initial state $q_0$ and a halting state $q_{halt}$;
• $\Sigma$ is an alphabet with cardinality at least two;
• $\delta : Q \times \Sigma \to Q \times \Sigma \times \{-1, 0, 1\}$ is a transition function.

For a given Turing machine $T$ and an input tape $s = (s_i)_{i \in \mathbb{Z}} \in \Sigma^{\mathbb{Z}}$, the Turing machine's computation is carried out according to the following process:

[0] Initialize the current state $q$ to $q_0$, and the current tape $w = (w_i)_{i \in \mathbb{Z}}$ to be $s$.
[1] If $q = q_{halt}$ then halt the algorithm and return $w$ as output. Otherwise compute $\delta(q, w_0) = (q', w_0', \varepsilon)$, where $\varepsilon \in \{-1, 0, 1\}$.
[2] Update the current state and tape by setting $q = q'$ and the $0$th position of $w$ to $w_0 = w_0'$.
[3] Update $w$ with the $\varepsilon$-shifted tape $(w_{i+\varepsilon})_{i \in \mathbb{Z}}$, then return to [1].
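Steps [0] to [3] translate directly into a simulator. A minimal sketch, assuming a sparse dictionary tape with blank symbol 0 and an explicit step budget (both our own choices); shifting the tape by $\varepsilon$ as in step [3] is implemented equivalently by moving a head index by $\varepsilon$. The budget matters: a `None` result only means the machine has not halted yet, which is exactly why halting cannot be settled by simulation alone.

```python
from typing import Dict, Optional, Tuple

# delta maps (state, symbol) -> (new_state, written_symbol, shift), shift in {-1, 0, 1}
Transition = Dict[Tuple[str, int], Tuple[str, int, int]]


def run_turing_machine(delta: Transition, q0: str, q_halt: str,
                       tape: Dict[int, int], max_steps: int
                       ) -> Optional[Tuple[str, Dict[int, int], int]]:
    """Run steps [0]-[3]; return (state, non-blank tape relative to the head, steps),
    or None if the machine has not halted within max_steps."""
    q = q0                     # step [0]: initial state
    w = dict(tape)             # sparse tape: absolute position -> symbol; missing = blank 0
    head = 0                   # shifting the tape by eps == moving the head by eps
    steps = 0
    while True:
        if q == q_halt:        # step [1]: halt and report the tape as seen from the head
            return q, {i - head: sym for i, sym in w.items() if sym != 0}, steps
        if steps >= max_steps:  # budget exhausted: inconclusive, not "never halts"
            return None
        q, written, eps = delta[(q, w.get(head, 0))]  # step [1]: apply delta(q, w_0)
        w[head] = written      # step [2]: overwrite position 0
        head += eps            # step [3]: eps-shift, then loop back to [1]
        steps += 1


# Hypothetical machine: scan right over a block of 1s and append one more 1.
delta_append: Transition = {("q0", 1): ("q0", 1, 1), ("q0", 0): ("qh", 1, 0)}
```

For example, `run_turing_machine(delta_append, "q0", "qh", {0: 1, 1: 1, 2: 1}, 100)` halts after four steps with four 1s to the left of (and including) the head.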

Without loss of generality, we will assume that Turing machines adhere to standard simplifying conventions (cf. [26]). Specifically, we assume that the alphabet is $\Sigma = \{0, \dots, 9\}$ and any given tape of the Turing machine only has a finite number of symbols different from 0, where 0 represents the special "blank symbol". Under these assumptions it follows that there exists a finite (possibly large) integer $k_0 > 0$ such that any tape $w$ satisfies

$$w = (w_i)_{i \in \mathbb{Z}} = \dots 000\, w_{-k_0} \cdots w_{k_0}\, 000 \dots \tag{5}$$

with each $w_i \in \Sigma$. Equivalently, at any given step in the Turing machine's evolution, these assumptions ensure there can be at most $2k_0 + 1$ non-blank symbols on the tape. In particular, we get that the space of configurations of a Turing machine $T$ is $Q \times A \subset Q \times \Sigma^{\mathbb{Z}}$, where $A$ is the subset of strings taking the form (5).

The construction of dynamical systems that simulate Turing ma- 
chines is at the heart of our results, and has been studied for various 
problems in physics [6, 9, 20]. Although equivalent definitions exist, 
our analyses will adopt the formalisms used by recent work on fluid 
dynamics [5, 27]. An analogous definition can be given for flows 
on a manifold. 


DEFINITION 1. A vector field $X$ on a manifold $M$ simulates a Turing machine $T$ if there exists an explicitly constructible open set $U_{w_{-k}, \dots, w_k} \subset M$ corresponding to each finite string $w_{-k}, \dots, w_k \in \Sigma$, and an explicitly constructible point $p_s \in M$ corresponding to each $s \in \Sigma^{\mathbb{Z}}$, such that: $T$ with input tape $s$ halts with an output tape having values $w_{-k}, \dots, w_k$ in positions $-k, \dots, k$ respectively if and only if the trajectory of $X$ through $p_s$ intersects $U_{w_{-k}, \dots, w_k}$.


Intuitively, a dynamical system simulates a Turing machine if there is a correspondence between trajectories reaching certain sets and computations halting with certain configurations. In particular, constructing the point $p_s$ depends only on the Turing machine $T$ and input tape $s$, while constructing the set $U_{w_{-k}, \dots, w_k}$ depends only on the specified halting configuration of $T$. Both here and throughout the paper, we say a mathematical object (e.g. points, sets, or matrices) is constructible if it can be computed in finite time; constructibility is not explicitly used in our arguments, but is important for nuanced technical reasons since it disallows pathological scenarios such as having all information about a machine's computations being encoded in an initial condition.

Definition 1 leads to a natural notion of Turing completeness for 
dynamical systems. 


DEFINITION 2. A dynamical system is Turing complete if it can 
simulate a universal Turing machine T. 


3 TURING COMPLETE DYNAMICS ON 
MATRIX GAMES 


Our goal in this section is to establish the Turing completeness 
of replicator dynamics; in §3.1 we provide all precursory results 
required to prove the main result in §3.2. 


3.1 Turing Complete Vector Fields and 
Approximation-Free Game-Embeddings 


Our construction of Turing complete game dynamics relies crucially 
on the notion of generalized Lotka-Volterra (GLV) vector fields. In 
particular, two properties of GLV vector fields will play a key role 
in the proof: (i) polynomial vector fields on $\mathbb{R}^n_{>0}$ are a special case
of GLV vector fields, and (ii) GLV vector fields can be embedded
into RD on matrix games without approximation.

Formally, a GLV vector field is a vector field on $\mathbb{R}^n_{>0}$ given by the system of ODEs

$$\dot{x}_i = \frac{dx_i}{dt} = x_i\left(\lambda_i + \sum_{j \in [m]} A_{ij} \prod_{k \in [n]} x_k^{B_{jk}}\right), \qquad i \in [n], \tag{6}$$

where $m$ is some positive integer, $\lambda \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times m}$, and $B \in \mathbb{R}^{m \times n}$. Since exponents given by $B$ can be any real number, the terms in the parentheses are multivariate generalized polynomials. In special cases where the ODEs are standard multivariate polynomials, GLV vector fields equate to polynomial vector fields, a fact straightforwardly shown by noting that any polynomial vector field $P = \{P_i\}_{i \in [n]}$ on $\mathbb{R}^n_{>0}$ is equivalent to the GLV vector field $P = \{x_i (x_i^{-1} P_i)\}_{i \in [n]}$.
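Eq. (6) and the polynomial-to-GLV rewriting can be checked numerically. The sketch below (helper names and the example field are hypothetical) evaluates a GLV field from $(\lambda, A, B)$ and encodes $P(x_1, x_2) = (x_2,\, -x_1 + x_1^2)$ by dividing each component by $x_i$, which introduces the negative exponents that GLV fields allow.

```python
import math
from typing import List


def glv_field(lam: List[float], A: List[List[float]], B: List[List[float]],
              x: List[float]) -> List[float]:
    """Right-hand side of eq. (6) for x in R^n_{>0}:
    x_i * (lam_i + sum_j A[i][j] * prod_k x_k ** B[j][k])."""
    monomials = [math.prod(xk ** bjk for xk, bjk in zip(x, Bj)) for Bj in B]
    return [xi * (li + sum(aij * mj for aij, mj in zip(Ai, monomials)))
            for xi, li, Ai in zip(x, lam, A)]


def poly_field(x: List[float]) -> List[float]:
    """Hypothetical polynomial vector field P(x1, x2) = (x2, -x1 + x1^2)."""
    x1, x2 = x
    return [x2, -x1 + x1 ** 2]


# GLV encoding of P via P_i = x_i * (P_i / x_i):
#   P_1 / x_1 = x1^-1 x2                   -> monomial exponents (-1, 1)
#   P_2 / x_2 = -x1 x2^-1 + x1^2 x2^-1     -> exponents (1, -1) and (2, -1)
lam = [0.0, 0.0]
B = [[-1, 1], [1, -1], [2, -1]]
A = [[1.0, 0.0, 0.0],
     [0.0, -1.0, 1.0]]
```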

Polynomial and GLV vector fields play an integral role by allowing us to invoke recent results by [5] and [1]. The starting point of
our construction can be stated as follows:


PROPOSITION 1 (THEOREM 4.1 OF [5]). There exists a constructible polynomial vector field $X$ of degree 58 on $S^{17}$ which is Turing complete and bounded.


In Appendix A we provide a proof sketch of this result; we refer 
the reader to [5] for the full proof. In §3.2 we will extend the Turing 
completeness from Proposition 1 to replicator dynamics in matrix 
games by leveraging recent work by [1]. In essence, [1] showed 
that GLV vector fields can approximate essentially any dynamical 
system, and that any GLV vector field can be embedded into the 
dynamics of RD on some matrix game. In this paper we only rely on 
the latter result, since polynomial vector fields are already a special 
case of GLV vector fields and thus do not need to be approximated. 


PROPOSITION 2 (THEOREM 3 OF [1]). Let $P$ be a GLV vector field on $\mathbb{R}^n_{>0}$ and $\Phi$ be the flow generated by $P$. For $m > n$, there exists a flow $\Psi$ on $\mathrm{relint}(\Delta^m)$ and a constructible diffeomorphism $f : \mathbb{R}^n_{>0} \to P \subset \mathrm{relint}(\Delta^m)$ such that:

(i) The flow $\Psi$ on $\mathrm{relint}(\Delta^m)$ is given by RD on a matrix game with payoff matrix $A \in \mathbb{R}^{m \times m}$.
(ii) The flow $\Psi|_P = f(\Phi)$ and $\Phi = f^{-1}(\Psi|_P)$, where $\Psi|_P$ is the flow given by $\Psi$ restricted to $P$.
(iii) The integer $m - 1$ is at least the number of unique monomials in $P$.

At a high level, proving Proposition 2 boils down to composing 
an embedding trick introduced by [4] with Theorem 7.5.1 by [13]. 
The relationship highlighted here between m — 1 and the number 
of monomials was not included in the original statement by [1];
however, it is shown as part of an important step in their proof and
is required for Corollary 1. 
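A classical fact from evolutionary game theory conveys the flavour of such embeddings: for a Lotka-Volterra system $\dot{y}_i = y_i(r_i + \sum_j a_{ij} y_j)$ on $\mathbb{R}^n_{>0}$, the map $x_i = y_i / (1 + \sum_j y_j)$, $x_{n+1} = 1 / (1 + \sum_j y_j)$ sends its orbits to those of RD on $\Delta^{n+1}$ with payoff matrix $\begin{pmatrix} a & r \\ 0 & 0 \end{pmatrix}$, after the time change $d\tau = x_{n+1}\,dt$. We assume this is the kind of step taken from [13]; the GLV case additionally needs the embedding trick of [4], which is not reproduced here. The sketch below (hypothetical helper names) checks the identity $\frac{d}{dt}(x_i / x_{n+1}) = x_{n+1}\,\dot{y}_i$ numerically at one point.

```python
from typing import List


def replicator_field(A: List[List[float]], x: List[float]) -> List[float]:
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    avg = sum(xi * f for xi, f in zip(x, Ax))
    return [xi * (f - avg) for xi, f in zip(x, Ax)]


def lotka_volterra_field(r: List[float], a: List[List[float]],
                         y: List[float]) -> List[float]:
    """dy_i/dt = y_i * (r_i + sum_j a_ij y_j)."""
    return [yi * (ri + sum(aij * yj for aij, yj in zip(row, y)))
            for yi, ri, row in zip(y, r, a)]


def lv_to_replicator_matrix(r: List[float], a: List[List[float]]) -> List[List[float]]:
    """(n+1) x (n+1) payoff matrix: block a, last column r, last row zero."""
    n = len(r)
    return [list(a[i]) + [r[i]] for i in range(n)] + [[0.0] * (n + 1)]


def correspondence_gap(r: List[float], a: List[List[float]], y: List[float]) -> float:
    """Max |d/dt (x_i / x_{n+1}) - x_{n+1} * LV_i(y)| at the image point x of y."""
    n = len(y)
    s = 1.0 + sum(y)
    x = [yi / s for yi in y] + [1.0 / s]
    dx = replicator_field(lv_to_replicator_matrix(r, a), x)
    xl = x[-1]
    # quotient rule for y_i = x_i / x_{n+1}
    dy = [(dx[i] * xl - x[i] * dx[-1]) / xl ** 2 for i in range(n)]
    lv = lotka_volterra_field(r, a, y)
    return max(abs(d - xl * v) for d, v in zip(dy, lv))
```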


3.2 Replicator Dynamics on Matrix Games is 
Turing Complete 
To prove the main result of this section, Theorem 1, we will derive a diffeomorphism of the Turing complete vector field constructed in Proposition 1 that enables us to apply Proposition 2.


THEOREM 1. There exists $m > 0$ and a constructible matrix game $A \in \mathbb{R}^{m \times m}$ such that replicator dynamics on $A$ is Turing complete.


PROOF. Let $X$ be the Turing complete polynomial vector field on $S^{17}$ given by Proposition 1. We begin by embedding $X$ into a polynomial vector field $\tilde{X}$ on $\mathbb{R}^{18}$ where $S^{17}$ is globally attracting. Since trajectories of $\tilde{X}$ are globally attracted to $S^{17}$, a standard change of coordinates via translation yields a polynomial vector field that is well-defined on $\mathbb{R}^{18}_{>0}$. Therefore, as polynomial vector fields on $\mathbb{R}^{18}_{>0}$ are a special case of GLV vector fields, we will conclude the proof by applying Proposition 2 from §3.1.

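Let $\{\phi_i\}_{i \in [18]}$ be the set of polynomials given by $X$. Define $\pi(x) = 1 - \|x\|_2^2$ for $x \in \mathbb{R}^{18}$. Now define $\tilde{X}$ as the vector field on $\mathbb{R}^{18}$ given by the system

$$\dot{x}_i = x_i\left(\pi(x) + \frac{\phi_i(x)}{x_i}\right) = x_i\,\pi(x) + \phi_i(x),$$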
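for each $i \in [18]$. By construction $S^{17}$ is forward invariant under $\tilde{X}$, as $\pi(x) = 0$ on $S^{17}$ and $X$ is forward invariant on $S^{17}$. Furthermore, observe that for $x = x(t) \in \mathbb{R}^{18}$ the solutions of $\tilde{X}$ satisfy

$$\begin{aligned}
\frac{d}{dt}\|x\|_2^2 &= 2\sum_{i \in [18]} x_i \dot{x}_i \\
&= 2\left(\sum_{i \in [18]} x_i^2\, \pi(x) + \sum_{i \in [18]} x_i \phi_i(x)\right) \\
&= 2\pi(x) \sum_{i \in [18]} x_i^2 + 2\sum_{i \in [18]} x_i \phi_i(x) \\
&= 2\pi(x)\|x\|_2^2 \\
&= 2\|x\|_2^2\left(1 - \|x\|_2^2\right),
\end{aligned}$$

since, by definition of $TS^{17}$, the constraint $\|x\|_2 = 1$ ensures $X$ satisfies

$$\sum_{i \in [18]} x_i \phi_i(x) = 0.$$

The term $2\|x\|_2^2(1 - \|x\|_2^2)$ is a logistic equation in $\|x\|_2^2$. Thus, for every nonzero $x \in \mathbb{R}^{18}$, we know $\|x\|_2 \to 1$ as $t \to \infty$. It follows that $S^{17}$ is globally attracting for the trajectories generated by $\tilde{X}$.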
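The attraction to the sphere can be seen in a toy three-dimensional analogue with $S^2$ in place of $S^{17}$. In the sketch below, a rotation field is a hypothetical stand-in for the degree-58 field of Proposition 1; it is tangent to every sphere centred at the origin, so the identity $\sum_i x_i \phi_i(x) = 0$ used above holds everywhere for this toy field. Forward Euler (our choice, for illustration only) drives $\|x\|_2$ to about 1 from inside and outside the sphere.

```python
import math
from typing import List


def rotation_field(x: List[float]) -> List[float]:
    """Hypothetical stand-in for phi: tangent to every sphere, since x . phi(x) = 0."""
    return [-x[1], x[0], 0.0]


def extended_field(x: List[float]) -> List[float]:
    """X~_i = x_i * pi(x) + phi_i(x) with pi(x) = 1 - ||x||_2^2."""
    pi = 1.0 - sum(v * v for v in x)
    phi = rotation_field(x)
    return [xi * pi + p for xi, p in zip(x, phi)]


def norm_after(x0: List[float], dt: float = 0.01, steps: int = 2000) -> float:
    """Euclidean norm of the forward-Euler state after steps * dt time units."""
    x = list(x0)
    for _ in range(steps):
        v = extended_field(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return math.sqrt(sum(v * v for v in x))
```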

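Denote a standard translation of axes by $\sigma \in \mathbb{R}$ as $F_\sigma : \mathbb{R}^{18} \to \mathbb{R}^{18}$, $x \mapsto x + \sigma\mathbb{1}$, where $\mathbb{1}$ is the all-ones vector. Since solutions of $\tilde{X}$ are attracted to $S^{17}$ and Proposition 1 ensures $\{\phi_i\}_{i \in [18]}$ is bounded due to the reparametrization done in eq. (4.2) in [5], there exist suitable values of $\sigma$ such that composing $F_\sigma$ with $\tilde{X}$ yields a polynomial vector field that is forward invariant on $\mathbb{R}^{18}_{>0}$. Formally, let $B > 0$ be the bound on $\{\phi_i\}_{i \in [18]}$ given in Proposition 1, i.e. for all $i \in [18]$ and $x \in \mathbb{R}^{18}$ the vector field $\tilde{X}$ satisfies $|\phi_i(x)| < B$. To ensure the translated vector field is forward invariant on $\mathbb{R}^{18}_{>0}$, it suffices to find $\sigma$ such that $Y = F_\sigma \circ \tilde{X}$ is strictly positive on the boundary, i.e. $\dot{y}_i > 0$ whenever $y \in \mathbb{R}^{18}_{\geq 0}$ has $y_i = 0$ for some $i \in [18]$. By definition, $Y$ at any $y \in \mathbb{R}^{18}_{\geq 0}$ is identical to $\tilde{X}$ at $x = y - \sigma\mathbb{1}$: the system of equations $\{Y_i\}_{i \in [18]}$ is given by the system of equations $\{\tilde{X}_i\}_{i \in [18]}$ under the substitution $x = y - \sigma\mathbb{1}$. Therefore we find that, for $y \in \mathbb{R}^{18}_{\geq 0}$ with $y_i = 0$ for some $i \in [18]$,

$$\begin{aligned}
\dot{y}_i &= (y_i - \sigma)\,\pi(y - \sigma\mathbb{1}) + \phi_i(y - \sigma\mathbb{1}) \\
&\geq (-\sigma)\left(1 - \|y - \sigma\mathbb{1}\|_2^2\right) - B \\
&= \sigma\|y - \sigma\mathbb{1}\|_2^2 - \sigma - B \\
&\geq \sigma^3 - \sigma - B,
\end{aligned}$$

where the last step uses $\|y - \sigma\mathbb{1}\|_2^2 \geq (y_i - \sigma)^2 = \sigma^2$ and $\sigma > 0$. This implies $\dot{y}_i > 0$ whenever $B < -\sigma + \sigma^3$. Thus, for values of $\sigma$ satisfying $B < -\sigma + \sigma^3$, the vector field $Y = F_\sigma \circ \tilde{X}$ is well defined on $\mathbb{R}^{18}_{>0}$ for all initial conditions in $\mathbb{R}^{18}_{>0}$.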
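The condition $B < -\sigma + \sigma^3$ gives an explicit recipe for the translation: for $B > 0$, any $\sigma$ larger than the unique real root of $\sigma^3 - \sigma - B = 0$ works, and that root exceeds 1. The bisection below is our own sketch, with $B$ treated as a given number (the actual bound from Proposition 1 is not computed here).

```python
def sigma_threshold(B: float, tol: float = 1e-12) -> float:
    """Return (an upper approximation of) the real root sigma* > 1 of
    sigma^3 - sigma - B = 0 for B > 0; any sigma > sigma* gives B < -sigma + sigma^3."""
    if B <= 0:
        raise ValueError("B must be positive")

    def g(s: float) -> float:
        return s ** 3 - s - B

    lo, hi = 1.0, 2.0  # g(1) = -B < 0 and g is increasing for s >= 1
    while g(hi) <= 0:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi
```

For $B = 1$ the threshold is the real root of $\sigma^3 = \sigma + 1$, roughly 1.3247.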

By definition of $Y$, as a translated copy of $\tilde{X}$, the set $F_\sigma(S^{17})$ is globally attracting in $Y$, and $Y|_{F_\sigma(S^{17})}$ is a Turing complete polynomial vector field. It follows that we have constructed a polynomial vector field on $\mathbb{R}^{18}_{>0}$ that inherits the Turing complete dynamics of $X$. Since polynomial vector fields on $\mathbb{R}^{18}_{>0}$ are a special case of GLV vector fields on $\mathbb{R}^{18}_{>0}$, from Proposition 2 there exists a diffeomorphism $f : \mathbb{R}^{18}_{>0} \to P \subset \mathrm{relint}(\Delta^m)$ from trajectories of $Y$ onto trajectories of an invariant submanifold of replicator dynamics on a matrix game $A \in \mathbb{R}^{m \times m}$.

We conclude by showing how the Turing completeness of $X$ corresponds to Turing completeness for replicator dynamics on $A$. Suppose we have a given Turing machine $T$, an input tape $s$, and some finite string $w$. By Proposition 1 there exists a point $p_s$ and open set $U_w$ such that trajectories of $X$ through $p_s$ intersect $U_w$ if and only if $T$ halts with input $s$ and output matching $w$ about the machine's head. Our analysis above shows that $\tilde{X}|_{S^{17}} = X$, so trajectories of $\tilde{X}$ through $p_s$ intersect $U_w$ if and only if $T$ halts with input $s$ and output matching $w$. Therefore, after translating $\tilde{X}$, we know trajectories of $Y$ through $F_\sigma(p_s)$ intersect $F_\sigma(U_w)$ if and only if $T$ halts with input $s$ and output matching $w$. Finally, since diffeomorphisms are closed under composition, we conclude that trajectories of replicator dynamics on $A$ through the point $f(F_\sigma(p_s))$ intersect the set $f(F_\sigma(U_w))$ if and only if $T$ halts with input $s$ and output matching $w$, where $f$ is the diffeomorphism above. Thus, on an invariant submanifold of $\mathrm{relint}(\Delta^m)$, replicator dynamics on $A$ simulates $T$. Taking $T$ to be a universal Turing machine completes the proof. □


An interesting corollary of Theorem 1 is that we arrive at a 
bound on the number of actions needed for defining games where 
learning dynamics can be Turing complete. The bound is likely 
loose for several reasons. Firstly, the polynomial vector field from 
Proposition 1 is not known to have minimal degree or dimension.
Secondly, the combinatorial argument in Appendix B makes no 
attempt at a nuanced count on the number of unique monomials in 
the polynomials given by these vector fields. Deriving a tight bound 
is not only an interesting open question for game dynamics, but also 
for recent work in fluid dynamics [5, 29] and analog computing [12]. 


CoroLLARY 1. For some m < ($) + 1, there exists a matrix game 
$A \in \mathbb{R}^{m \times m}$ such that replicator dynamics on $A$ is Turing complete.
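As a generic reference point for how such combinatorial counts scale (this is not the count carried out in Appendix B, which is not reproduced here), the number of monomials of total degree at most $d$ in $n$ variables is $\binom{n + d}{n}$. The sketch below (hypothetical helper names) computes it in closed form and cross-checks it by brute-force enumeration of exponent vectors.

```python
from itertools import product
from math import comb


def count_monomials(n_vars: int, max_degree: int) -> int:
    """Monomials of total degree <= max_degree in n_vars variables: C(n_vars + max_degree, n_vars)."""
    return comb(n_vars + max_degree, n_vars)


def count_monomials_brute(n_vars: int, max_degree: int) -> int:
    """Enumerate exponent vectors directly (only feasible for small inputs)."""
    return sum(1 for e in product(range(max_degree + 1), repeat=n_vars)
               if sum(e) <= max_degree)
```

For instance, `count_monomials(18, 58)` counts all monomials of degree at most 58 in 18 variables; this is a crude worst case for a single polynomial of Proposition 1 and is not claimed to match the bound in Corollary 1.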


4 UNDECIDABLE PHENOMENA IN 
NO-REGRET LEARNING DYNAMICS 


The Turing completeness of replicator dynamics (i.e. Theorem 1) 
has deep implications for machine learning and, more generally, 
learning in strategic environments. Specifically, if a dynamical sys- 
tem simulates a Turing machine, Definition 1 gives a reduction 
from the halting problem for Turing machines to the reachability 
problem for dynamical systems, which we use alongside the Turing 
completeness established in Theorem 1 to uncover the existence 
of undecidable reachability problems. As will be discussed in §4.2, 
the existence of undecidable problems makes it increasingly impor- 
tant that we understand computability in instances of reachability 
arising from fundamental solution concepts for game theory and 
machine learning. 


4.1 The Halting and Reachability Problems 


The halting problem is a prototypical decision problem for Turing 
machines and is arguably the most famous undecidable problem in 
computer science. Given a Turing machine T and an input tape, the 
halting problem for T asks whether or not T will halt. By contrast, 
the reachability problem is canonical for dynamical systems and 
has been studied in various control settings; given a dynamical 
system X and a set of initial conditions, the reachability problem for 
X asks whether or not X’s trajectory will intersect a predetermined 
set. Although the computability of the halting problem is generally 
well understood in Turing machines, the computability of the reach- 
ability problem has not traditionally been studied in the context of 
game dynamics. However, from the strong equivalence between 
halting and reachability required by Definition 1, we immediately 
get a reduction between these classic decision problems. 


PROPOSITION 3. If a dynamical system X simulates a Turing ma- 
chine T, then the halting problem for T reduces to the reachability 
problem for X. 


The proof of this proposition follows directly from Definition 1, 
since checking whether the dynamical system reaches a set be- 
comes equivalent to checking whether the Turing machine halts 
by definition. From Theorem 1 we know that replicator dynamics
on a matrix game can simulate a universal Turing machine. Therefore, due to the undecidability of the halting problem in general,
we deduce that the reachability problem can be undecidable for 
replicator dynamics on matrix games. 
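The asymmetry behind this undecidability shows up in any attempt to decide reachability by simulation: running the dynamics longer can confirm that a set is reached, but a finite budget never certifies that it is not. The sketch below (our own illustration, with hypothetical names) integrates eq. (3) with forward Euler, so it only approximates the true trajectory and is not a sound decision procedure.

```python
from typing import Callable, List, Optional


def replicator_field(A: List[List[float]], x: List[float]) -> List[float]:
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    avg = sum(xi * f for xi, f in zip(x, Ax))
    return [xi * (f - avg) for xi, f in zip(x, Ax)]


def bounded_reach(A: List[List[float]], x0: List[float],
                  in_target: Callable[[List[float]], bool],
                  dt: float = 0.01, max_steps: int = 10_000) -> Optional[int]:
    """Return the first Euler step at which in_target holds, or None.

    None only means 'not observed within the budget', never 'unreachable'."""
    x = list(x0)
    for step in range(max_steps + 1):
        if in_target(x):
            return step
        v = replicator_field(A, x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return None


# Hypothetical game in which strategy 0 strictly dominates strategy 1.
A_dom = [[1.0, 1.0], [0.0, 0.0]]
```

Here the first coordinate obeys $\dot{x}_1 = x_1(1 - x_1)$, so from $(0.5, 0.5)$ the set $\{x_1 > 0.99\}$ is reached near $t = \ln 99 \approx 4.6$, whereas for $\{x_2 > 0.99\}$ the sketch can only ever answer `None`.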


COROLLARY 2. There exist matrix games where the reachability 
problem is undecidable for replicator dynamics. 


The corollary follows immediately from Proposition 3 and Theo- 
rem 1, since the undecidability of the halting problem for universal 
Turing machines uncovers the undecidability of the reachability 
problem for replicator dynamics on matrix games. 


4.2 Implications for No-Regret Learning in 
Games 


Games are primarily understood and studied via equilibrium con- 
cepts, e.g. Nash equilibria, evolutionary stable strategies, and coarse 
correlated equilibria. It is therefore unsurprising that a central goal 
of learning in games is often to converge on some set of equilibria. 
Yet, beyond certain special cases (e.g. potential games), learning 
behaviours remain largely enigmatic and there has been limited 
progress towards resolving non-convergence in general settings. 
The results in this paper may explain why: determining conver- 
gence to a set of equilibria is a special case of reachability, and 
identifying learning algorithms that provably converge on such a 
set may be an undecidable problem even in very simple classes of 
games. The goal of this section is to formalize this intuition. 

In Corollary 2 we found that reachability can be undecidable for 
replicator dynamics on matrix games. Therefore, taken as a negative 
result, Corollary 2 implies that undecidable trajectories can exist 
in larger classes of game dynamics where replicator dynamics on 
matrix games is a special case. Unfortunately, replicator dynamics 
is a special case of FTRL dynamics and no-regret learning dynamics
more generally [17], which suggests these popular learning dynam- 
ics can inherit the negative result on any class of games containing 
matrix games. Similarly, matrix games are very restricted and a 
special case of many popular classes of games, e.g. normal form 
and smooth games. As an example of how broadly these results 
generalize, matrix games in the FTRL framework describe qua- 
dratic objective functions and thus undecidable trajectories exist 
for optimization-driven learning over quadratic objectives. Thus, 
as Corollary 2 holds for replicator dynamics on matrix games, we 
arrive at the reachability problem being generally undecidable for 
rich classes of game dynamics studied in the literature and used in 
practice. 


COROLLARY 3. There exist games where reachability for no-regret 
learning dynamics is undecidable. 


In light of Corollary 2, the claim follows from our discussion 
above. As determining convergence to sets of game theoretic solu- 
tion concepts is a special case of the reachability problem, Corol- 
lary 3 reveals that determining whether game dynamics converge 
to fundamental solution concepts is undecidable in general. It is 
important to note that the undecidability may not hold for specific 
games, learning dynamics, or solution concepts; the primary take- 
away is that undecidability is possible and has strong implications 
about how we should approach these important questions. 


5 DISCRETE LEARNING DYNAMICS AND 
TURING MACHINE SIMULATIONS 


Thus far we have focused on continuous-time replicator learning 
dynamics, but in practice discrete-time learning dynamics are typi- 
cally used. A folk result in the study of game dynamics states that 
the multiplicative weights update (MWU) algorithm is essentially 
an Euler discretization of replicator dynamics. It is therefore natural 
to ask whether MWU, the discrete analogue of replicator dynamics,
is also Turing complete. Unfortunately, as will be shown in this
section, standard numerical error analyses are likely insufficient
for proving Turing completeness in discrete time; intuitively, this
is because discretizations of a continuous time process will
yield error bounds that grow as a function of time. We will formalize
these error bounds in §5.1 and use them in §5.2 to begin untangling 
the computational power of MWU. Discussions of related open 
questions are left for §6. 
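The folk result can be seen at the level of a single step. Assuming the standard exponential-weights form of MWU, $x_i^{t+1} = x_i^t e^{\eta (Ax^t)_i} / \sum_j x_j^t e^{\eta (Ax^t)_j}$ (the paper's own derivation is in Appendix C and is not reproduced here), a first-order Taylor expansion gives $x_i^{t+1} = x_i^t + \eta\, x_i^t((Ax^t)_i - (x^t)^\top A x^t) + O(\eta^2)$, i.e. an Euler step of eq. (3). The sketch below (hypothetical helper names, rock-paper-scissors as an example game) checks that the gap between the two updates shrinks roughly fourfold when $\eta$ is halved.

```python
import math
from typing import List


def payoffs(A: List[List[float]], x: List[float]) -> List[float]:
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def mwu_step(A: List[List[float]], x: List[float], eta: float) -> List[float]:
    """Exponential-weights update: x_i <- x_i exp(eta (Ax)_i) / normalizer."""
    w = [xi * math.exp(eta * ui) for xi, ui in zip(x, payoffs(A, x))]
    z = sum(w)
    return [wi / z for wi in w]


def euler_rd_step(A: List[List[float]], x: List[float], eta: float) -> List[float]:
    """One forward-Euler step of eq. (3) with step size eta."""
    u = payoffs(A, x)
    avg = sum(xi * ui for xi, ui in zip(x, u))
    return [xi + eta * xi * (ui - avg) for xi, ui in zip(x, u)]


def step_gap(A: List[List[float]], x: List[float], eta: float) -> float:
    """Max-norm distance between one MWU step and one Euler step."""
    return max(abs(a - b) for a, b in zip(mwu_step(A, x, eta), euler_rd_step(A, x, eta)))


RPS = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
```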


5.1 Discretization Error of Multiplicative 
Weights Updates 


The fact that MWU is a discretization of replicator dynamics is 
well known in the field of game dynamics, but a precise derivation 
of this relationship is often omitted. For clarity in our analysis of 
discretization errors, we will highlight one possible discretization 
that reveals MWU as a discrete-analogue of replicator dynamics in 
Appendix C. The discretization we arrive at is used to find a bound 
on the cumulative error of MWU relative to replicator dynamics, 
which is crucial for the analyses and discussion to follow. 


LEMMA 1. Let $\Phi$ be the flow generated by replicator dynamics and $x^t$ be the mixed strategy found on the $t$-th iterate of MWU. The error accrued by a single iteration of MWU with step-size $\eta > 0$ is


IIx!" — (7, xlo < 1-2", 


The proof of Lemma 1 consists of relatively straightforward cal- 
culations, but requires carefully handling nonlinearities introduced 
by MWU; a full proof is included in Appendix D. 

Using Lemma 1 as a basis, we can bound the error accrued over multiple iterations of MWU.¹


LEMMA 2. Let $\Phi$ be the flow generated by replicator dynamics and $x^t$ be the mixed strategy found on the $t$-th iterate of MWU. The error accrued after $t + 1$ iterations of MWU with step-size $\eta > 0$ is


lx+! - (tn, Xl < O (e) ; 


A full derivation of Lemma 2 is found in Appendix E, and follows 
from using Lemma 1 alongside standard techniques for bounding 
error in iterated numerical methods. 


5.2 Simulating Bounded Turing Machines with Multiplicative Weights Update


The result in Lemma 2 shows that, relative to replicator dynamics, the error accrued by MWU grows with the number of iterations. Error that grows as a function of time is problematic when simulating a Turing machine by using MWU as a discretization of replicator dynamics.


¹In the language of numerical analysis, Lemma 1 gives the local error used to find the global error in Lemma 2.


Recall that in Theorem 1 we showed that replicator dynamics can simulate a universal Turing machine by embedding a dynamical system that simulates a universal Turing machine, in such a way that the Turing machine halts if and only if the trajectory of the dynamics reaches a certain set. However, in general, determining whether such a Turing machine will halt, or how many steps it requires to halt, is undecidable. Therefore, without an a priori bound on the maximum amount of time needed to determine whether the machine halts, we cannot choose step sizes for MWU that guarantee the discretization remains sufficiently close to replicator dynamics when intersecting the relevant set.


THEOREM 2. Let k > 0 be a finite integer and let $T_k$ be the set of Turing machines that we can determine to halt or not after k steps of computation. For any k, there exist step sizes η > 0 such that MWU with step size η can simulate any Turing machine in $T_k$.


The result follows from the construction of the open sets used in 
Proposition 1 and the fact that we can ensure MWU’s discretization 
error stays sufficiently small over any finite window of time due to 
Lemma 2. Resolving the limitations of Theorem 2, and uncovering 
the true computational power of discrete algorithms such as MWU, 
will likely require new technical approaches for bounding errors or 
simulating Turing machines. 
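To illustrate the idea behind Theorem 2 on a toy example (an empirical sketch, not part of the proof), the code below measures the largest sup-norm gap between MWU iterates and a high-accuracy RK4 approximation of the replicator flow over a fixed, finite horizon, for two step sizes. The game, the initial payoffs, the horizon, and the helper names are assumptions chosen for illustration; the only point is that over a bounded window of time, shrinking η keeps MWU closer to replicator dynamics.

```python
import math

def logit(y):
    """Logit map sigma: R^n -> simplex (softmax), shifted by max(y) for stability."""
    m = max(y)
    w = [math.exp(v - m) for v in y]
    s = sum(w)
    return [v / s for v in w]

def payoffs(A, x):
    """Expected payoff of each action i: sum_j A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def mwu_step(A, y, eta):
    """One MWU iterate in cumulative-payoff space: y <- y + eta * A sigma(y)."""
    return [yi + eta * ui for yi, ui in zip(y, payoffs(A, logit(y)))]

def rd_flow(A, y, T, dt=1e-3):
    """Approximate replicator dynamics y' = A sigma(y) over time T with classical RK4."""
    def f(z):
        return payoffs(A, logit(z))
    n_steps = max(1, round(T / dt))
    h = T / n_steps
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y

def max_gap(A, y0, eta, horizon):
    """Largest sup-norm gap between MWU iterate x^t and the RD flow at time t * eta."""
    y_mwu, y_rd = list(y0), list(y0)
    worst = 0.0
    for _ in range(round(horizon / eta)):
        y_mwu = mwu_step(A, y_mwu, eta)
        y_rd = rd_flow(A, y_rd, eta)  # advance the reference solution by eta
        gap = max(abs(a - b) for a, b in zip(logit(y_mwu), logit(y_rd)))
        worst = max(worst, gap)
    return worst

A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]  # illustrative rock-paper-scissors game
y0 = [0.5, 0.0, -0.3]
coarse = max_gap(A, y0, eta=0.1, horizon=2.0)
fine = max_gap(A, y0, eta=0.01, horizon=2.0)
print(f"max gap with eta=0.1: {coarse:.2e}; with eta=0.01: {fine:.2e}")
```

Note that the horizon is fixed in advance here; this is exactly the a priori time bound that is unavailable for general Turing machines.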


6 CONCLUSION 


We have shown that replicator dynamics in matrix games can simulate universal Turing machines. In continuous time, this observation was extended to provide deeper insight into the complexities of game-theoretic learning. In fact, as highlighted in §4, the plurality of negative results on game dynamics can be understood as a natural byproduct of Theorem 3. Given that the present paper uses replicator dynamics specifically and matrix games broadly, complementing these results with analyses of other learning dynamics and classes of games could be instrumental in guiding future research toward settings where designing well-behaved game dynamics is a tractable problem. Compartmentalizing the complexity of learning in games using traditional complexity classes, as was done for Turing machines in computational complexity theory and as the techniques used in our analyses make natural, is a promising line of investigation for finding tractable settings for learning in games.

In discrete time, the Turing completeness of replicator dynamics was used to show that MWU can simulate bounded Turing machines. However, our approach does not rule out the possibility that MWU is also Turing complete; rather, using MWU's relationship to replicator dynamics seems to have inherent numerical limitations arising from error that grows with time. Since discrete-time learning is more applicable in practice, it remains an important open question to determine whether MWU and other discrete learning algorithms are Turing complete. That being said, the smoothness constraints on continuous-time learning often lead to better-behaved dynamics than their discrete analogues, and thus continuous dynamics generally serve as a restricted special case of what is possible in discrete time. As evidence of this claim, not only are complex dynamical phenomena prevalent in low-dimensional discrete systems where they are impossible in continuous systems (e.g. chaos [7]), but Figure 1 demonstrates the robustness of MWU by showing that it can follow replicator dynamics on a matrix game derived by [1] to emulate the iconic Lorenz strange attractor. In future work, instead of using continuous learning dynamics as a proxy, directly simulating Turing machines with discrete dynamics may provide powerful tools for learning in games. Research on Turing machine simulations using physical systems has a rich history and encompasses far more than what is discussed in this paper. Various techniques have been used to directly simulate Turing machines using discrete dynamics [18, 25], and insights from this prior work may prove valuable for learning in games.

Figure 1 (panels: "Replicator Dynamics", left; "MWU (Step Size = 1.0)", right): A comparison of replicator dynamics and MWU on a matrix game derived by [1] to simulate a chaotic dynamical system. On the left is replicator dynamics with the chaotic system embedded into its behaviour, whereas on the right are 10000 iterations of MWU with a relatively large step size. Although not identical, MWU clearly retains the intricate and complex behaviours of replicator dynamics.
A TURING COMPLETE POLYNOMIAL FLOWS ON $\mathbb{S}^n$


We will briefly sketch the construction by [5] of the Turing complete polynomial vector field in Proposition 1; for a complete treatment we refer the reader to [5]. To simplify notation, throughout this section we will represent a step in a Turing machine T's evolution (i.e. an iteration of Steps 1-3 in §2.4) by the global transition function

$$G_T : Q \times \mathcal{A} \to Q \times \mathcal{A},$$

where we set $G_T(q_{\text{halt}}, w) := (q_{\text{halt}}, w)$ for any tape w.

Let $T = (Q, \Sigma, \delta, q_0, q_{\text{halt}})$ be a Turing machine. We begin by encoding each configuration of T as a constructible point in $\mathbb{R}^3$. Let $r = |Q|$ be the cardinality of the set of states Q; we then represent the elements of Q by $[r] = \{1, \dots, r\} \subset \mathbb{N}$. Since tapes satisfy eq. (5), we can encode any such tape $w = (w_j)_{j \in \mathbb{Z}}$ as the pair of natural numbers given by

$$y_1 = w_0 + w_1 10 + \cdots + w_k 10^{k}, \qquad y_2 = w_{-1} + w_{-2} 10 + \cdots + w_{-k} 10^{k-1}.$$

Taken together, we have an encoding of every $(q, w) \in Q \times \mathcal{A}$ as $(y_1, y_2, q) \in \mathbb{N}^3 \subset \mathbb{R}^3$. Define $\xi : Q \times \mathcal{A} \to \mathbb{N}^3$ as the map assigning each configuration in $Q \times \mathcal{A}$ its associated point in $\mathbb{N}^3$. The global transition function $G_T$ can now be reinterpreted as a map from $\xi(Q \times \mathcal{A}) \subset \mathbb{N}^3$ to itself. By extending this map to be the identity on points in $\mathbb{N}^3 \setminus \xi(Q \times \mathcal{A})$, we arrive at a map from the whole of $\mathbb{N}^3$ to itself; for simplicity, we also denote this extended map by $G_T : \mathbb{N}^3 \to \mathbb{N}^3$.
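A minimal sketch of the encoding ξ, assuming, as the base-10 encoding suggests, that tape symbols are the digits 0 to 9 with 0 as the blank, that states have already been relabelled as 1, …, r, and that k bounds the non-blank region ($w_j = 0$ for $|j| > k$). The tape is represented as a dictionary from cell index to symbol; the function names `encode` and `decode` are hypothetical.

```python
def encode(q, tape, k):
    """Map a configuration (q, w) to the point (y1, y2, q) in N^3.

    tape: dict {cell index j: symbol w_j}; absent cells are blank (symbol 0).
    k: assumed bound such that w_j = 0 whenever |j| > k.
    """
    y1 = sum(tape.get(j, 0) * 10**j for j in range(0, k + 1))         # w_0, ..., w_k
    y2 = sum(tape.get(-j, 0) * 10**(j - 1) for j in range(1, k + 1))  # w_-1, ..., w_-k
    return (y1, y2, q)

def decode(point, k):
    """Invert encode by reading off base-10 digits; blank cells are omitted."""
    y1, y2, q = point
    tape = {}
    for j in range(0, k + 1):
        digit = (y1 // 10**j) % 10
        if digit:
            tape[j] = digit
    for j in range(1, k + 1):
        digit = (y2 // 10**(j - 1)) % 10
        if digit:
            tape[-j] = digit
    return q, tape

config = encode(2, {0: 3, 1: 5, -1: 7, -2: 2}, k=2)
print(config)  # (53, 27, 2)
```

Because the encoding is injective on configurations, $G_T$ can be transported to a map on the image $\xi(Q \times \mathcal{A})$, which is what allows the identity extension to the rest of $\mathbb{N}^3$.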

Using this encoding, the next step in the construction is to simulate T using a polynomial vector field P on $\mathbb{R}^{n+3}$. To this end, a modification of a construction by [11] is given. Specifically, [11] construct a non-autonomous polynomial vector field that simulates T, and this vector field is made autonomous via the standard trick of introducing a proxy variable in place of the explicit dependence on time. Let P on $\mathbb{R}^{n+3}$ be the autonomous polynomial vector field derived via this modification. The construction by [11] also shows how, given an input tape $s \in \mathcal{A}$, a point $p_s = (\xi(q_0, s), \bar{y}_0) \in \mathbb{R}^{n+3}$ is constructed so that the trajectory of P starting from $p_s$ simulates $G_T$. The term $\xi(q_0, s) \in \mathbb{R}^3$ is defined above, and the term $\bar{y}_0 \in \mathbb{R}^n$ comes from a composition of polynomials depending only on T and s; neither of these points is affected by the modification, so both can be taken as is. The group property of flows ensures that any trajectory passing through $p_s$ is equivalent to a trajectory ending at and then "restarting" from $p_s$, so we can assume without loss of generality that $p_s$ is an initial condition in Definition 2. Given a finite string $w^* = (w^*_{-k}, \dots, w^*_k)$ of symbols in Σ, we now construct the set $U_{w^*}$ in Definition 2.² Let $\Omega = \{w \in \Sigma^{\mathbb{Z}} \mid w_i = w^*_i \ \forall i \in [-k, k]\}$, let ε > 0 be a small positive constant, and let $R_{(q_{\text{halt}}, \Omega)}$ be the set of points in $\mathbb{R}^3$ corresponding to configurations of T of the form $(q_{\text{halt}}, w)$ with $w \in \Omega$. Defining $U^\varepsilon \subset \mathbb{R}^3$ as an ε-neighborhood of $R_{(q_{\text{halt}}, \Omega)}$ gives the open set

$$U_{w^*} = U^\varepsilon \times \mathbb{R}^n.$$


Showing that P satisfies Definition 2 with this choice of $p_s$ and $U_{w^*}$ follows from a relatively straightforward argument using properties inherited from the construction by [11]. Finally, the polynomial vector field X in Proposition 1 is constructed by using the pullback of inverse stereographic projection on a suitable reparametrization of P and taking T to be a universal Turing machine. The pullback of inverse stereographic projection ensures that X is a polynomial vector field tangent to the sphere, and the reparametrization ensures the vector field is bounded.³ The fact that X is well defined on $\mathbb{S}^{17}$ and has degree 58 follows from an analysis by [12] of the construction by [11].
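For concreteness, a minimal sketch of the coordinate map involved: inverse stereographic projection $\mathbb{R}^n \to \mathbb{S}^n \subset \mathbb{R}^{n+1}$, projecting from the north pole $(0, \dots, 0, 1)$, together with stereographic projection as its inverse. The choice of pole and the function names are assumptions, and the pullback of the vector field P and the reparametrization used in [5] are not reproduced here.

```python
import math

def inv_stereo(x):
    """Map x in R^n onto the unit sphere S^n in R^{n+1}; the north pole is never hit."""
    s = sum(v * v for v in x)
    return [2 * v / (s + 1) for v in x] + [(s - 1) / (s + 1)]

def stereo(u):
    """Stereographic projection from the north pole: S^n minus (0, ..., 0, 1) -> R^n."""
    *head, last = u
    return [v / (1 - last) for v in head]

x = [0.3, -1.2, 2.0]
u = inv_stereo(x)
print(math.sqrt(sum(v * v for v in u)))  # approximately 1.0
print(stereo(u))                          # approximately [0.3, -1.2, 2.0]
```

Since $\|x\|^2 \to \infty$ maps to the north pole, bounded behaviour far out in $\mathbb{R}^{n+3}$ is compressed into a neighbourhood of a single point on the sphere, which is one reason the reparametrization matters.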


B PROOF OF COROLLARY 1 


COROLLARY 1. For some $m \le \binom{76}{18} + 1$, there exists a matrix game $A \in \mathbb{R}^{m \times m}$ such that replicator dynamics on A is Turing complete.


PROOF. Let X, $\hat{X}$, and Y be the vector fields defined in the proof of Theorem 1. Similarly, let $A \in \mathbb{R}^{m \times m}$ be the matrix game we arrive at by applying Proposition 2 to Y. From Proposition 2 we know that m − 1 is at most the number of unique monomials in the generalized polynomials in Y, so the proof follows by bounding the number of unique monomials from above.

From Proposition 1 we know that X is a polynomial vector field of degree 58. As mentioned in Appendix A, the specific degree of 58 was derived from follow-up work by [12] analyzing the construction by [11]. However, although the vector field is technically constructible, actually constructing X to simulate a universal Turing machine is non-trivial in practice. With this complication in mind, a crude upper bound on the number of unique monomials in X is simply the number of unique monomials of degree at most 58 in 18 variables. A standard combinatorial (stars and bars) argument tells us that the number of unique monomials in the polynomials of X is at most $\binom{58+18}{18} = \binom{76}{18}$. The construction of $\hat{X}$ cannot increase the number of monomials counted by this combinatorial argument, since it can only introduce unique monomials via the term $1 - |x|$, whose monomials are already counted in the bound $\binom{76}{18}$. Similarly, we construct Y by translating $\hat{X}$ by a constant and therefore can only introduce the constant monomial (i.e. the term with all variables having zero exponents), which is already counted. Thus we have found that $m - 1 \le \binom{76}{18}$, which implies $m \le \binom{76}{18} + 1$. □

²For brevity we will brush over the construction of this set on the component corresponding to the proxy variable for time. Technically this component should be a union of small open intervals, one for each i ∈ ℕ, which intuitively associates a rough length of time in the dynamical system with a step of the Turing machine. However, formally introducing this portion of the construction is not particularly insightful, since its relevance to the proof is rather tautological: the proxy variable increases monotonically at the same constant rate as time.

³Technically $X = \tilde{X}|_{\mathbb{S}^{n+3}}$, where $\tilde{X}$ is a polynomial vector field on $\mathbb{R}^{n+4}$ tangent to $\mathbb{S}^{n+3}$. Similarly, as discussed in the proof of Theorem 1.3 by [5], the reparametrization ensures the flow is global (defined for all time) because the vector field is bounded.
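The stars-and-bars count used in the proof of Corollary 1 can be checked mechanically. The sketch below compares the closed form $\binom{n+d}{d}$ for the number of monomials of degree at most d in n variables against brute-force enumeration for small n and d, then evaluates the bound for n = 18 and d = 58 with Python's `math.comb`; the helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def count_monomials_brute(n_vars, max_deg):
    """Enumerate monomials of total degree <= max_deg as multisets of variable indices."""
    return sum(
        sum(1 for _ in combinations_with_replacement(range(n_vars), d))
        for d in range(max_deg + 1)
    )

def count_monomials(n_vars, max_deg):
    """Stars and bars: monomials of degree <= max_deg number C(n_vars + max_deg, max_deg)."""
    return comb(n_vars + max_deg, max_deg)

# Brute force agrees with the closed form on small cases.
for n in range(1, 5):
    for d in range(6):
        assert count_monomials_brute(n, d) == count_monomials(n, d)

m_upper = count_monomials(18, 58) + 1  # Corollary 1: m <= C(76, 18) + 1
print(m_upper)
```

Since $\binom{76}{58} = \binom{76}{18}$, the value computed here is exactly the bound stated in Corollary 1.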
C DERIVING MWU AS DISCRETE ANALOGUE OF REPLICATOR DYNAMICS
Let $\sigma : \mathbb{R}^n \to \Delta^n$ be the logit map defined as

$$\sigma_i(y) = \frac{\exp(y_i)}{\sum_{j \in [n]} \exp(y_j)}, \qquad y \in \mathbb{R}^n,\ i \in [n].$$


[14] showed that the flow generated by replicator dynamics can be written as

$$x_i(t) = \sigma_i(y(t)) = \frac{\exp(y_i(t))}{\sum_{j \in [n]} \exp(y_j(t))}, \tag{7}$$
where x and y are the mixed strategy and cumulative payoff vectors given in eq. (2). Rewriting eq. (7) in the form of eq. (2) gives an explicit representation of replicator dynamics' trajectories as a function of cumulative payoffs,

$$y_i(t) = y_i(0) + \int_0^t \sum_{j \in [n]} A_{ij}\, x_j(s)\, ds, \qquad x_i(t) = \sigma_i(y(t)), \qquad y(0) \in \mathbb{R}^n,\ i \in [n],\ t \in \mathbb{R}. \tag{8}$$

By applying a standard Euler discretization with step size η to the payoffs y in eq. (8), we find

$$y_i(t + \eta) \approx y_i(t) + \eta\, \dot{y}_i(t) = y_i(t) + \eta \sum_{j \in [n]} A_{ij}\, \sigma_j(y(t)).$$

Finally, iteratively applying this Euler discretization of the cumulative payoffs and using the logit map gives the well-known MWU algorithm. Formally, denoting the discretization's t-th iterate by $y^t$, we write MWU as

$$y_i^{t+1} = y_i^t + \eta \sum_{j \in [n]} A_{ij}\, x_j^t = y_i^0 + \eta \sum_{\tau=0}^{t} \sum_{j \in [n]} A_{ij}\, x_j^\tau, \qquad x_i^{t+1} = \sigma_i(y^{t+1}) = \sigma_i\Big(y^0 + \eta \sum_{\tau=0}^{t} A x^\tau\Big). \tag{9}$$

As the form of MWU in eq. (9) was found via an Euler discretization, a standard result in numerical analysis tells us that the error accrued by a single iteration of MWU starting from the same initial conditions as replicator dynamics satisfies

$$|y_i^1 - y_i(\eta)| \le O(\eta^2).$$


However, since we are simulating Turing machines in the space of 
mixed strategies, we need error bounds on the probability simplex 
itself and not in the space of cumulative payoffs. 
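The $O(\eta^2)$ local error of the Euler step in payoff space can be observed numerically. In the sketch below, the game, the initial payoffs, the RK4 reference solver, and the helper names are illustrative assumptions; halving η should shrink the one-step payoff error by a factor close to 4, as expected for a method with second-order local error.

```python
import math

def logit(y):
    """Logit map sigma: R^n -> simplex (softmax), shifted by max(y) for stability."""
    m = max(y)
    w = [math.exp(v - m) for v in y]
    s = sum(w)
    return [v / s for v in w]

def payoffs(A, x):
    """Expected payoff of each action i: sum_j A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rd_flow(A, y, T, dt):
    """Approximate y' = A sigma(y) (eq. (8)) over time T with classical RK4."""
    def f(z):
        return payoffs(A, logit(z))
    n_steps = max(1, round(T / dt))
    h = T / n_steps
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y

A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]  # illustrative rock-paper-scissors game
y0 = [0.5, 0.0, -0.3]

def euler_local_error(eta):
    """Sup-norm payoff-space gap between one Euler (MWU) step and the RD flow at time eta."""
    y_euler = [yi + eta * ui for yi, ui in zip(y0, payoffs(A, logit(y0)))]
    y_exact = rd_flow(A, y0, eta, dt=eta / 20)
    return max(abs(a - b) for a, b in zip(y_euler, y_exact))

ratio = euler_local_error(0.01) / euler_local_error(0.005)
print(f"error ratio when halving eta: {ratio:.3f}")
```

Mapping this payoff-space error back to the simplex is the step handled separately in Appendix D.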


D PROOF OF LEMMA 1 


LEMMA 1. Let Φ be the flow generated by replicator dynamics and $x^t$ be the mixed strategy found on the t-th iterate of MWU. The error accrued by a single iteration of MWU with step size η > 0 is

$$\|x^{t+1} - \Phi(\eta, x^t)\|_\infty \le 1 - e^{-2\eta}.$$


PROOF. Suppose without loss of generality that for any action i the expected payoff is bounded to [−1, 1], i.e. $\sum_{j \in [n]} A_{ij}\, \sigma_j(y) \in [-1, 1]$.⁴ Let $W(t) = \sum_{j \in [n]} \exp(y_j(t))$ and $W_i(t) = \exp(y_i(t)) = x_i(t) W(t)$. Then continuous-time RD becomes

$$x_i(t) = \frac{W_i(t)}{W(t)}.$$

Similarly, define $W^t = \sum_{j \in [n]} \exp(y_j^t)$ and $W_i^t = \exp(y_i^t) = x_i^t W^t$. Then MWU becomes

$$x_i^t = \frac{W_i^t}{W^t}.$$

We are interested in bounding the local error of MWU as a discretization of RD, i.e. the error introduced by a single step of MWU relative to RD after a single step starting from the same point. Thus without loss of generality we focus on the first iterate of MWU and on RD after t = η units of time. Since expected payoffs are bounded to [−1, 1], we deduce from the analysis in Appendix C that


$$W_i^0 e^{-\eta} \le W_i(\eta) \le W_i^0 e^{\eta},$$

which implies

$$W^0 e^{-\eta} \le W(\eta) \le W^0 e^{\eta}.$$

Hence

$$x_i^0 e^{-2\eta} \le x_i(\eta) \le x_i^0 e^{2\eta}$$

whenever RD and MWU start from the same initial condition. We have thus found that the local error introduced by a single time step is

$$\|x^1 - x(\eta)\|_\infty \le \|x^0 - x^0 e^{-2\eta}\|_\infty \le |1 - e^{-2\eta}|.$$

Observing that η > 0 gives the result. □


E PROOF OF LEMMA 2 


LEMMA 2. Let Φ be the flow generated by replicator dynamics and $x^t$ be the mixed strategy found on the t-th iterate of MWU. The error accrued after t + 1 iterations of MWU with step size η > 0 is

$$\|x^{t+1} - \Phi((t+1)\eta, x^0)\|_\infty \le O\big(e^{t\eta L}\big),$$

where L is the Lipschitz constant defined in the proof below.


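⁴The assumption that expected payoffs are bounded to [−1, 1] is without loss of generality, since we can always normalize the payoff matrix by its entry of largest absolute value; for replicator dynamics this only rescales time, and for MWU it is equivalent to rescaling the step size.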


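PROOF. The replicator vector field is $C^1$ and $\Delta^n$ is compact, so the vector field is Lipschitz continuous on $\Delta^n$; let L denote its Lipschitz constant with respect to $\|\cdot\|_\infty$, which we write as $\|\cdot\|$ below. By Grönwall's inequality, the time-η flow map then satisfies $\|\Phi(\eta, x) - \Phi(\eta, x')\| \le e^{\eta L} \|x - x'\|$. It follows that for every initial condition $x^0 \in \Delta^n$,

$$\begin{aligned}
E^{t+1} := \|x^{t+1} - \Phi((t+1)\eta, x^0)\| &= \|x^{t+1} - \Phi(\eta, \Phi(t\eta, x^0))\| \\
&= \|x^{t+1} - \Phi(\eta, x^t) + \Phi(\eta, x^t) - \Phi(\eta, \Phi(t\eta, x^0))\| \\
&\le \|x^{t+1} - \Phi(\eta, x^t)\| + \|\Phi(\eta, x^t) - \Phi(\eta, \Phi(t\eta, x^0))\| \\
&\le |1 - e^{-2\eta}| + e^{\eta L} \|x^t - \Phi(t\eta, x^0)\| \\
&= |1 - e^{-2\eta}| + e^{\eta L} E^t.
\end{aligned}$$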
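To conclude our proof, we require a special case of the discrete Grönwall lemma. This powerful tool for numerical error analysis tells us that if, for some constants a and b with a > 0, a positive sequence $\{z^\tau\}_{\tau=0}^{t}$ satisfies

$$z^{\tau+1} \le b + a z^\tau, \qquad \forall \tau \in \{0, \dots, t-1\},$$

then for a ≠ 1

$$z^\tau \le a^\tau z^0 + b\,\frac{a^\tau - 1}{a - 1}, \qquad \forall \tau \in [t],$$

and for a = 1

$$z^\tau \le z^0 + \tau b, \qquad \forall \tau \in [t].$$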
Recall that both η > 0 and L > 0 by definition. Let $z^\tau = E^\tau$, $a = e^{\eta L}$, and $b = |1 - e^{-2\eta}|$. Since a > 1, applying the discrete Grönwall lemma yields

$$E^{t+1} \le e^{(t+1)\eta L} E^0 + |1 - e^{-2\eta}|\,\frac{e^{(t+1)\eta L} - 1}{e^{\eta L} - 1}.$$
Clearly $E^0 = 0$ since MWU and replicator dynamics have the same initial conditions. Thus we have shown

$$E^{t+1} \le |1 - e^{-2\eta}|\,\frac{e^{(t+1)\eta L} - 1}{e^{\eta L} - 1} \in O\big(e^{t\eta L}\big),$$

which concludes the proof. □
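The closed form in the discrete Grönwall lemma, and the resulting bound in Lemma 2, are easy to check numerically. The sketch below iterates the worst case $z^{\tau+1} = b + a z^\tau$ and confirms it matches the closed form, checks that a randomly damped sequence satisfying the hypothesis stays below it, and evaluates the Lemma 2 bound for a few t. The values of η and L are arbitrary placeholders, since no specific L is computed in the paper, and the helper names are hypothetical.

```python
import math
import random

def gronwall_bound(z0, a, b, tau):
    """Closed-form bound on z^tau from the discrete Gronwall lemma (a > 0)."""
    if a == 1:
        return z0 + tau * b
    return a**tau * z0 + b * (a**tau - 1) / (a - 1)

def iterate(z0, a, b, t, rng=None):
    """Sequence with z^{tau+1} <= b + a z^tau; equality holds when rng is None."""
    z = [z0]
    for _ in range(t):
        scale = 1.0 if rng is None else rng.random()  # damping in [0, 1) keeps the inequality
        z.append(scale * (b + a * z[-1]))
    return z

def lemma2_bound(eta, L, t):
    """Bound on E^{t+1} from Lemma 2, using E^0 = 0, a = e^{eta L}, b = |1 - e^{-2 eta}|."""
    a, b = math.exp(eta * L), abs(1 - math.exp(-2 * eta))
    return gronwall_bound(0.0, a, b, t + 1)

# The equality case reproduces the closed form for a < 1, a = 1 and a > 1.
for a in (0.5, 1.0, 1.3):
    z = iterate(0.2, a, 0.1, 20)
    assert all(math.isclose(z[tau], gronwall_bound(0.2, a, 0.1, tau), rel_tol=1e-9)
               for tau in range(21))

print([lemma2_bound(eta=0.1, L=2.0, t=t) for t in (0, 10, 20)])
```

For fixed η and L the printed values grow geometrically in t, which is the time-dependent error growth discussed in §5.2.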